GEE Analysis of Clustered Binary Data with Diverging Number of Covariates
Abstract
Clustered binary data with a large number of covariates have become increasingly common in many scientific disciplines. This paper develops an asymptotic theory for generalized estimating equations (GEE) analysis of clustered binary data when the number of covariates grows to infinity with the number of clusters. In this “large n, diverging p” framework, we provide appropriate regularity conditions and establish the existence, consistency and asymptotic normality of the GEE estimator. Furthermore, we prove that the sandwich variance formula remains valid. Even when the working correlation matrix is misspecified, the use of the sandwich variance formula leads to asymptotically valid confidence intervals and Wald tests for an estimable linear combination of the unknown parameters. The accuracy of the asymptotic approximation is examined through numerical simulations. We also discuss the diverging p asymptotic theory for general GEE. The results in this paper extend the recent elegant work of Xie and Yang (2003) and Balan and Schiopu-Kratina (2005) in the “fixed p” setting.
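For orientation, the objects named in the abstract can be written out in the standard GEE notation. This is only a sketch: the symbols below (clusters i = 1, ..., n, responses Y_i, design matrices X_i, working correlation R_i, vector c) are assumed here for illustration, and the paper's own notation and regularity conditions may differ.

```latex
% Assumed notation: cluster i has binary responses Y_i with marginal mean
% \mu_i(\beta), logit link, A_i = diag{\mu_{ij}(1-\mu_{ij})}, and working
% correlation matrix R_i (possibly misspecified).
\begin{align*}
  % Estimating equation solved by the GEE estimator \hat\beta
  S_n(\beta) &= \sum_{i=1}^{n} D_i^{\top} V_i^{-1}\,\bigl(Y_i - \mu_i(\beta)\bigr) = 0,
  \qquad D_i = A_i X_i, \quad V_i = A_i^{1/2} R_i A_i^{1/2}, \\
  % Sandwich variance: "bread" H_n and "meat" M_n, both evaluated at \hat\beta
  \widehat{\Sigma}_n &= H_n^{-1} M_n H_n^{-1},
  \qquad H_n = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i, \\
  M_n &= \sum_{i=1}^{n} D_i^{\top} V_i^{-1}
         \bigl(Y_i - \mu_i\bigr)\bigl(Y_i - \mu_i\bigr)^{\top} V_i^{-1} D_i, \\
  % Wald-type interval of nominal level 1-\gamma for a linear combination c^T \beta
  c^{\top}\hat\beta &\pm z_{1-\gamma/2}\,\sqrt{c^{\top}\widehat{\Sigma}_n\, c}.
\end{align*}
```

In this form, the empirical middle term M_n does not depend on R_i being the true correlation, which is why the sandwich formula can remain usable under a misspecified working correlation matrix.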
Similar resources
Generalized Additive Partial Linear Models for Clustered Data with Diverging Number of Covariates Using GEE
We study flexible modeling of clustered data using marginal generalized additive partial linear models with a diverging number of covariates. Generalized estimating equations are used to fit the model with the nonparametric functions being approximated by polynomial splines. We investigate the asymptotic properties in a “large n, diverging p” framework. More specifically, we establish the consi...
Marginal modeling of multilevel binary data with time-varying covariates.
We propose and compare two approaches for regression analysis of multilevel binary data when clusters are not necessarily nested: a GEE method that relies on a working independence assumption coupled with a three-step method for obtaining empirical standard errors, and a likelihood-based method implemented using Bayesian computational techniques. Implications of time-varying endogenous covariat...
Modeling In Vitro Fertilization Data Considering Multiple Outcomes Observed among Iranian Infertile Women
Objective: Women undergoing IVF cycles must pass successfully through multiple points during the procedure (i.e., implantation, clinical pregnancy, no spontaneous abortion and delivery) to achieve live births. On the other hand, there is a need to consider previous reproductive outcomes as well as the current cycle. In this study, data on multiple cycles and multiple points during the IVF cycle ar...
Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study
BACKGROUND: The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. METHODS: In this simulation study, clustered responses were generated ...